Smart Paper Notes

SOCRATES: A Stereo Camera Trap for Monitoring of Biodiversity

Timm Haucke , Hjalmar Kühl , Volker Steinhage

Categories: Computer Vision | Machine Learning

2022-09-19

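The development and application of modern technology is an essential basis for the efficient monitoring of species in natural habitats and landscapes, both to trace the development of ecosystems, species communities, and populations and to analyze the causes of change. For estimating animal abundance with methods such as camera trap distance sampling, spatial information about natural habitats in the form of 3D (three-dimensional) measurements is crucial. In addition, 3D information improves the accuracy of animal detection in camera trap footage. This study presents a novel approach to 3D camera trapping with highly optimized hardware and software. The approach employs stereo vision to infer 3D information about natural habitats and is designated the Stereo Camera Trap for Monitoring of Biodiversity (SOCRATES). A comprehensive evaluation of SOCRATES shows not only a $3.23\%$ improvement in animal detection (bounding box $\text{mAP}_{75}$) but also that it can be used to estimate animal abundance with camera trap distance sampling. The software and documentation of SOCRATES are available at https://github.com/timmh/socrates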
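
As a rough illustration of how stereo vision yields 3D information, the sketch below applies the standard pinhole relation Z = f * B / d for a rectified camera pair. It is not taken from the SOCRATES code base; the function name and the numeric values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres from the pinhole stereo relation Z = f * B / d.

    Assumes a rectified stereo pair; all parameter names are illustrative.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: 1000 px focal length, 0.25 m baseline, 50 px disparity.
distance_m = depth_from_disparity(50.0, 1000.0, 0.25)
print(distance_m)
```

A per-animal distance of this kind is the quantity that camera trap distance sampling needs as input.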

Translated by Google Translate

Extreme Gradient Boosting for Yield Estimation compared with Deep Learning Approaches

Florian Huber , Artem Yushchenko , Benedikt Stratmann , Volker Steinhage

Categories: Machine Learning

2022-08-26

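Accurate prediction of crop yield before harvest is of great importance for crop logistics, market planning, and food distribution around the world. Yield prediction requires monitoring phenological and climatic characteristics over extended periods in order to model the complex relationships involved in crop development. Remote sensing images provided by the various satellites orbiting the Earth are a cheap and reliable way to obtain data for yield prediction. The field of yield prediction is currently dominated by deep learning approaches. Although the accuracies reached with these approaches are promising, the amount of data they need and their "black-box" nature can restrict their application. These limitations can be overcome with a pipeline that processes remote sensing images into feature-based representations, which in turn allow Extreme Gradient Boosting (XGBoost) to be used for yield prediction. A comparative evaluation of soybean yield prediction in the United States shows promising prediction accuracy compared with state-of-the-art yield prediction systems based on deep learning. Feature importances identify the near-infrared spectrum as an important feature in our models. The reported results hint at the capabilities of XGBoost for yield prediction and encourage future experiments with XGBoost on other crops in regions around the world.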

Do DALL-E and Flamingo Understand Each Other?

Hang Li , Jindong Gu , Rajat Koner , Sahand Sharifzadeh , Volker Tresp

Categories: Computer Vision | Machine Learning

2022-12-23

A major goal of multimodal research is to improve machine understanding of images and text. Tasks include image captioning, text-to-image generation, and vision-language representation learning. So far, research has focused on the relationships between images and text. For example, captioning models attempt to understand the semantics of images, which are then transformed into text. An important question is: which annotation best reflects a deep understanding of image content? Similarly, given a text, what is the best image that can present the semantics of the text? In this work, we argue that the best text or caption for a given image is the text that would generate the image most similar to that image. Likewise, the best image for a given text is the image that results in the caption best aligned with the original text. To this end, we propose a unified framework that includes both a text-to-image generative model and an image-to-text generative model. Extensive experiments validate our approach.
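
The selection criterion argued for above (the best caption is the one whose regenerated image is most similar to the original) can be sketched as follows. This is a minimal illustration, not the authors' framework: `best_caption`, `toy_text_to_image`, and `toy_similarity` are hypothetical names, and the toy stand-ins replace real generative and similarity models.

```python
def best_caption(image, candidate_captions, text_to_image, image_similarity):
    """Return the caption whose regenerated image is most similar to `image`."""
    return max(
        candidate_captions,
        key=lambda caption: image_similarity(text_to_image(caption), image),
    )


# Toy stand-ins (hypothetical): an "image" is just a set of object names.
def toy_text_to_image(caption):
    # "Generates" an image containing every word of the caption.
    return set(caption.lower().split())


def toy_similarity(a, b):
    # Jaccard overlap between two sets of object names.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


image = {"dog", "ball", "grass"}
captions = ["dog", "dog ball grass", "cat sofa"]
chosen = best_caption(image, captions, toy_text_to_image, toy_similarity)
print(chosen)
```

The symmetric case (choosing the best image for a text) would swap the roles: caption each candidate image and compare the captions with the original text.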

Riemannian Optimization for Variance Estimation in Linear Mixed Models

Lena Sembach , Jan Pablo Burgard , Volker H. Schulz

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-18

Variance parameter estimation in linear mixed models is a challenge for many classical nonlinear optimization algorithms due to the positive-definiteness constraint of the random effects covariance matrix. We take a completely novel view on parameter estimation in linear mixed models by exploiting the intrinsic geometry of the parameter space. We formulate the problem of residual maximum likelihood estimation as an optimization problem on a Riemannian manifold. Based on the introduced formulation, we give geometric higher-order information on the problem via the Riemannian gradient and the Riemannian Hessian. Based on that, we test our approach with Riemannian optimization algorithms numerically. Our approach yields a higher quality of the variance parameter estimates compared to existing approaches.

Combining Planning, Reasoning and Reinforcement Learning to solve Industrial Robot Tasks

Matthias Mayr , Faseeh Ahmad , Konstantinos Chatzilygeroudis , Luigi Nardi , Volker Krueger

Categories: Robotics

2022-12-07

One of today's goals for industrial robot systems is to allow fast and easy provisioning for new tasks. Skill-based systems that use planning and knowledge representation have long been one possible answer to this. However, especially with contact-rich robot tasks that need careful parameter settings, such reasoning techniques can fall short if the required knowledge is not adequately modeled. We show an approach that combines task-level planning and reasoning with targeted learning of skill parameters for the task at hand. Starting from a task goal formulated in PDDL, the learnable parameters in the plan are identified, and an operator can choose reward functions and parameters for the learning process. A tight integration with a knowledge framework allows forming a prior for learning, and the use of multi-objective Bayesian optimization makes it easier to balance aspects such as safety and task performance, which often affect each other. We demonstrate the efficacy and versatility of our approach by learning skill parameters for two different contact-rich tasks and show their successful execution on a real 7-DOF KUKA-iiwa.

Synthetic data enable experiments in atomistic machine learning

John L. A. Gardner , Zoé Faure Beaulieu , Volker L. Deringer

Categories: Machine Learning

2022-11-29

Machine-learning models are increasingly used to predict properties of atoms in chemical systems. There have been major advances in developing descriptors and regression frameworks for this task, typically starting from (relatively) small sets of quantum-mechanical reference data. Larger datasets of this kind are becoming available, but remain expensive to generate. Here we demonstrate the use of a large dataset that we have "synthetically" labelled with per-atom energies from an existing ML potential model. The cheapness of this process, compared to the quantum-mechanical ground truth, allows us to generate millions of datapoints, in turn enabling rapid experimentation with atomistic ML models from the small- to the large-data regime. This approach allows us here to compare regression frameworks in depth, and to explore visualisation based on learned representations. We also show that learning synthetic data labels can be a useful pre-training task for subsequent fine-tuning on small datasets. In the future, we expect that our open-sourced dataset, and similar ones, will be useful in rapidly exploring deep-learning models in the limit of abundant chemical data.

PhotoFourier: A Photonic Joint Transform Correlator-Based Neural Network Accelerator

Shurui Li , Hangbo Yang , Chee Wei Wong , Volker J. Sorger , Puneet Gupta

Categories: Machine Learning

2022-11-10

The last few years have seen a lot of work to address the challenge of low-latency and high-throughput convolutional neural network inference. Integrated photonics has the potential to dramatically accelerate neural networks because of its low-latency nature. Combined with the concept of the Joint Transform Correlator (JTC), the computationally expensive convolution functions can be computed instantaneously (time of flight of light) with almost no cost. This 'free' convolution computation provides the theoretical basis of the proposed PhotoFourier JTC-based CNN accelerator. PhotoFourier addresses a myriad of challenges posed by on-chip photonic computing in the Fourier domain, including 1D lenses and high-cost optoelectronic conversions. The proposed PhotoFourier accelerator achieves more than 28X better energy-delay product compared to state-of-the-art photonic neural network accelerators.

Satellite Image Search in AgoraEO

Ahmet Kerem Aksoy , Pavel Dushev , Eleni Tzirita Zacharatou , Holmer Hemsen , Marcela Charfuelan , Jorge-Arnulfo Quiané-Ruiz , Begüm Demir , Volker Markl

Categories: Computer Vision

2022-08-23

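The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understanding and protecting our planet. However, the current use of EO archives is severely restricted by their huge size and by the limited exploration capabilities that EO platforms provide. To address this limitation, we recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a deep hashing network based on metric learning that encodes high-dimensional image features into compact binary hash codes. We use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and query-by-example over satellite image repositories. Demo visitors will interact with EarthQube in the role of different users who search images by their semantic content and apply additional filters.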
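
The idea of using binary hash codes as hash-table keys can be sketched as below. This is only an illustration under stated assumptions: in MiLaN the codes come from a learned deep hashing network, whereas here a plain sign threshold stands in for it, and all names (`binarize`, `build_index`, `search`) and feature values are hypothetical.

```python
from collections import defaultdict


def binarize(features):
    """Map a real-valued feature vector to a binary code by sign thresholding.

    Placeholder for a learned hashing network.
    """
    return tuple(1 if x > 0 else 0 for x in features)


def build_index(items):
    """Build a hash table mapping each binary code to the image names that share it."""
    index = defaultdict(list)
    for name, features in items.items():
        index[binarize(features)].append(name)
    return index


def search(index, query_features):
    """Look up the query's bucket, then every bucket at Hamming distance 1."""
    code = binarize(query_features)
    hits = list(index.get(code, []))
    for i in range(len(code)):
        neighbor = code[:i] + (1 - code[i],) + code[i + 1:]
        hits.extend(index.get(neighbor, []))
    return hits


# Hypothetical 4-dimensional features for three archive images.
archive = {
    "forest_a": [0.9, -0.2, 0.4, -0.7],
    "forest_b": [0.8, -0.1, 0.3, 0.2],
    "urban_a": [-0.5, 0.6, -0.3, 0.9],
}
index = build_index(archive)
results = search(index, [0.7, -0.3, 0.5, -0.1])
print(results)
```

Each lookup touches a fixed number of buckets regardless of archive size, which is what makes code-keyed retrieval attractive for very large archives.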

InstanceFormer: An Online Video Instance Segmentation Framework

Rajat Koner , Tanveer Hannan , Suprosanna Shit , Sahand Sharifzadeh , Matthias Schubert , Thomas Seidl , Volker Tresp

Categories: Computer Vision

2022-08-22

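Recent transformer-based offline video instance segmentation (VIS) approaches achieve encouraging results and significantly outperform online approaches. However, their reliance on the whole video and the immense computational complexity caused by full spatio-temporal attention limit their use in real-life applications such as processing lengthy videos. In this paper, we propose a single-stage, transformer-based, efficient online VIS framework named InstanceFormer, which is especially suitable for long and challenging videos. We propose three novel components to model short-term and long-term dependencies and temporal coherence. First, we propagate the representation, location, and semantic information of prior instances to model short-term changes. Second, we propose a novel memory cross-attention in the decoder, which allows the network to look into earlier instances within a certain temporal window. Finally, we employ a temporal contrastive loss to impose coherence in the representation of an instance across all frames. Memory attention and temporal coherence are particularly beneficial for long-range dependency modeling, including challenging scenarios such as occlusion. The proposed InstanceFormer outperforms previous online benchmark methods by a large margin across multiple datasets. Most importantly, InstanceFormer surpasses offline approaches on challenging and long datasets such as YouTube-VIS-2021 and OVIS. Code is available at https://github.com/rajatkoner08/instanceformer.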

Multi-Attribute Open Set Recognition

Piyapat Saranrittichai , Chaithanya Kumar Mummadi , Claudia Blaiotta , Mauricio Munoz , Volker Fischer

Categories: Computer Vision

2022-08-14

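Open Set Recognition (OSR) extends image classification to an open-world setting by simultaneously classifying known classes and identifying unknown ones. Although conventional OSR approaches can detect out-of-distribution (OOD) samples, they cannot explain which underlying visual attributes (e.g., shape, color, or background) cause a specific sample to be unknown. In this work, we introduce a new problem setting that generalizes conventional OSR to a multi-attribute setting in which multiple visual attributes are recognized simultaneously. Here, OOD samples can be not only identified but also categorized by their unknown attributes. We propose simple extensions of common OSR baselines to handle this novel scenario. We show that these baselines are vulnerable to shortcuts when spurious correlations exist in the training dataset. This leads to poor OOD performance, which, according to our experiments, is mainly due to unintended cross-attribute correlations of the predicted confidence scores. We provide empirical evidence that this behavior is consistent across different baselines on both synthetic and real-world datasets.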
